1 Summary of program objectives and outcomes

The goal of this work was to learn and exploit unknown spatio-temporal structure in online photon-limited
sensing and surveillance data. Photon-limited imaging arises in a wide variety of applications of interest
to the Air Force, including night vision, space weather, imaging through fog, and spectral imaging. The
photon-limited video reconstruction problem is particularly challenging because (a) the limited number of
available photons introduces intensity-dependent Poisson statistics, which require specialized algorithms and
analysis for optimal performance, (b) vast quantities of video data will be collected sequentially,
necessitating fast online algorithms, and (c) unknown and changing environmental dynamics preclude classical
methods based on known dynamical models. Many current systems sidestep photon limitations by
artificially restricting the frame rate and resolution of the video, but sophisticated statistical methods allow
dramatic increases in resolution and improved object identification and detection capabilities.

We addressed these challenges by developing new tools for learning and exploiting low-dimensional
signal structure, including sparsity and low-rank structure, from photon-limited data. In addition, we
developed novel online learning methods that allow large-scale photon-limited video data to be analyzed
as it is collected (as opposed to forensic analysis with a considerable time delay). More specifically, there
were four main outcomes from this work:

• Improved understanding of the fundamental limitations of compressed sensing (CS) for
photon-limited imaging. Several engineers and scientists from the optics and signal processing communities
have suggested that we design novel cameras for photon-limited settings based on the principles of CS.
Most prior theoretical results in compressed sensing and related inverse problems apply to idealized
settings where the noise is i.i.d., and do not account for signal-dependent noise and physical sensing
constraints. Prior results on Poisson compressed sensing with signal-dependent noise and physical
constraints in [16] provided upper bounds on mean squared error performance for a specific class of
estimators. However, it was unknown whether those bounds were tight or if other estimators could
achieve significantly better performance. Our work provided minimax lower bounds on mean squared
error for sparse Poisson inverse problems under physical constraints, and demonstrated that CS is not a
viable paradigm for photon-limited sensing and surveillance. For additional details, see

X. Jiang, G. Raskutti, and R. Willett, “Minimax optimal rates for Poisson inverse problems
with physical constraints”, accepted to IEEE Transactions on Information Theory,
arXiv:1403.6532, 2014.

• Novel method for photon-limited signal denoising which represents the current state-of-the-art.
We have developed denoising algorithms for photon-limited images which combine elements of
dictionary learning and sparse representations for image patches [17]. Our preliminary method
employs both an adaptation of Principal Component Analysis (PCA) for Poisson noise and our
sparsity-regularized convex optimization algorithms for photon-limited images. A comprehensive empirical
evaluation of the proposed method reveals that, despite its simplicity, PCA-flavored denoising appears
to be highly competitive in very low light regimes, as depicted in Figure 2. In this figure, we
compare with BM3D, widely considered to be the current state-of-the-art for image denoising, and other
widely used Poisson image denoising methods. For more details, see

J. Salmon, Z. Harmany, C. Deledalle, and R. Willett, “Poisson noise reduction with
non-local PCA,” Journal of Mathematical Imaging and Vision, vol. 48, no. 2, pp. 279-294,
arXiv:1206.0338, 2014.

• Reparameterizations of photon-limited images. Most photon-limited image reconstruction
methods optimize a regularized negative Poisson log-likelihood to estimate the underlying image intensity.
However, at very low intensities, this focus on the image intensity leads to several technical
challenges related to efficient optimization tools, cross-validation techniques, and empirical performance.
To combat this, we introduced a novel approach for Poisson image reconstruction that adapts to the
signal intensity level through a hybrid objective function with useful properties. This method
performs well visually and empirically, and outperforms prior models in terms of RMSE. Our method is
also amenable to cross-validation for selecting model parameters accurately.

A. K. Oh, Z. T. Harmany, and R. M. Willett, “To e or not to e in Poisson image
reconstruction”, Proceedings of the IEEE International Conference on Image Processing (ICIP),
2014. Received award as “Top 10% Paper”.

A  journal  version  of  this  work  is  in  progress. 

• New approaches for online photon-limited video reconstruction and analysis. With streaming
photon-limited video, it is possible to compute more accurate reconstructions by exploiting not only
spatial structure, as described above, but also temporal structure. Such reconstructions are essential
for subsequent analysis, such as foreground and background separation. However, standard stochastic
filtering methods (like Kalman or particle filters) are ill-suited for this regime. We developed novel
online learning methods which are capable not only of accounting for photon limitations, but also of
learning and exploiting the underlying scene dynamics. The supporting theory is a unique contribution to the
online learning literature, while the algorithms are fast and produce state-of-the-art reconstructions.

E.  Hall  and  R.  Willett,  “Foreground  and  background  reconstruction  in  Poisson  video”, 
Proceedings  of  the  IEEE  International  Conference  on  Image  Processing  (ICIP),  2013. 

A  journal  version  of  this  work  is  in  progress  and  is  detailed  below. 


2  Relationship  between  program  outcomes  and  previous  state-of-the-art 

The  four  major  outcomes  all  represent  advances  on  the  previous  state-of-the-art. 


Photon-limited CS. The bounds on compressed sensing for photon-limited imaging are the first known
lower bounds for this problem and highlight the practical challenges of using CS for night vision in USAF
equipment. An example of this is depicted in Figure 1. Specifically, we consider images which are
s-sparse in a wavelet basis, where some of the s non-zero wavelet coefficients are at a coarse scale. (In many
natural images, the number of coarse-scale coefficients is large relative to s.) The conventional wisdom of CS
tells us that CS is preferable to simply using a low-resolution imager - i.e., downsampling the scene (DS) -
because CS will recover both the coarse-scale coefficients and the fine-scale coefficients, while a
low-resolution imager only allows recovery of the coarse-scale coefficients. Our theory and supporting
simulations demonstrate that this conventional wisdom is incorrect in high-noise, photon-limited regimes,
where DS can significantly outperform CS.


Non-local PCA for photon-limited image estimation. As described in the previous section, our method
for photon-limited image estimation, which leverages ideas from dictionary learning and sparse recovery
methods, represents the current state-of-the-art in photon-limited imaging. Empirical results shown in
Figure 2 highlight the improvements possible via the work supported by this grant.
[Figure 1 panels: (a) Theoretical rates; (b) Empirical rates]

Figure 1: Theoretical and empirical rates of downsampling and compressed sensing methods. Plots
correspond to imaging a scene with 2048 pixels using 512 measurements when the scene has s = 10 non-zero
wavelet coefficients, some of which are coarse-scale coefficients that are directly measured by
the proposed downsampling scheme. We see that at low intensities, downsampling can yield much lower
MSEs, but after the intensity exceeds a critical threshold, compressed sensing methods are able to estimate
all nonzero coefficients accurately and the MSE is better than for downsampling schemes. This effect is
predicted by our theory.


[Figure 2 panels: (a) Original; (b) Noisy, PSNR = -7.11; (c) Multiscale partition, PSNR = 10.97;
(d) BM3D, PSNR = 12.92; (e) NLPCASbin, PSNR = 15.99]

Figure 2: Simulated images (flag in top row, ridges in bottom row) corrupted with Poisson noise with peak intensity
of 0.1. Our method is NLPCASbin, and its result is a notable improvement over the previous state of the art.


Reparameterizations of photon-limited images. The reparameterization of photon-limited images
allows fast and accurate recovery with empirical advantages over classical reconstruction approaches. This
is illustrated in Figure 3, where we recover an image from photon-limited observations using (c) classical
total-variation regularization of a negative Poisson log-likelihood and (d) total-variation regularization
applied to our reparameterization of the Poisson log-likelihood. The primary advance over previous work
is a new perspective on how regularizers can and should be selected in photon-limited imaging. The
classical approach is to determine from the outset that the scene will be parameterized by a linear function of its
pixel intensity values, and then choose a convex regularization function that will facilitate using off-the-shelf
convex optimization tools to compute an image estimate. In contrast, our approach is far more flexible in
that regularizers can be applied to non-linear transformations of the image (e.g., its logarithm), giving us
access to a much wider array of potential regularizers that can be used with convex optimization tools. These
regularizers can then be used to improve photon-limited image reconstruction.

[Figure 3 panels: (a) True scene intensity; (b) Photon-limited observations; (c) Classical method recovery
using total-variation regularized Poisson log-likelihood; (d) Recovery using total-variation regularization
with our reparameterization]

Figure 3: Denoising results using our reparameterization method.


Online reconstruction of streaming photon-limited video. Our methods for reconstructing
photon-limited video, particularly separating moving foreground and background from low-SNR data, are novel
and an advance over the previous state-of-the-art in several respects. Classical stochastic filtering methods
such as Kalman or particle filters or Bayesian updates [2] readily exploit dynamical models for effective
prediction and tracking performance. However, classical methods are also limited in their applicability because
(a) they typically assume an accurate, fully known dynamical model and (b) they rely on strong assumptions
regarding a generative model of the observations. Some techniques have been proposed to learn the
dynamics [20, 21], but the underlying model still places heavy restrictions on the nature of the data. The Kalman
filter relies on Gaussian noise models, and particle filters exhibit particle degeneracy, making them difficult
to use in practical settings.

A contrasting class of prediction methods is based on an “individual sequence” or “universal
prediction” [13] perspective; these strive to perform provably well on any individual observation sequence. In
particular, online convex programming methods [4, 7, 14, 23] rely on the gradient of the instantaneous
loss of a predictor to update the prediction for the next data point. The aim of these methods is to ensure
that the per-round performance approaches that of the best offline method with access to the entire data
sequence. This approach allows one to sidestep challenging issues associated with statistically dependent or
non-stochastic observations, misspecified generative models, and corrupted observations. This framework
is limited as well, however, because performance bounds are typically relative to either static or piecewise
constant comparators and do not adequately reflect adaptivity to a dynamic environment.

Our approach is a novel framework for prediction in the individual sequence setting which incorporates
dynamical models effectively through a novel combination of state updating from stochastic filter theory and
online convex optimization from universal prediction. This framework, and its application to photon-limited
video reconstruction, is detailed in the next section.
3 Online Foreground and Background Reconstruction in Poisson Video


Many  imaging  applications  such  as  night  vision,  infrared  imaging,  and  certain  astronomical  imaging  systems 
are  characterized  by  limited  amounts  of  available  light.  In  these  and  other  settings,  the  goal  is  to  reconstruct 
spatially  distributed  and  dynamic  phenomena  from  data  collected  by  counting  discrete  independent  events, 
such  as  photons  hitting  a  detector.  More  specifically,  we  can  model  our  observations  at  time  t  as 

    $y_t \sim \mathrm{Poisson}(\lambda_t)$,    (1)

where $y_t$ is the vector of photon counts across $n$ detectors and $\lambda_t$ is the intensity of interest (i.e.,
the $n$-pixel scene) [18].

We are interested in the case where $\lambda_t$ has two components: a dynamic foreground $\phi_t$, which occupies a
relatively small portion of the scene, and a static or slowly-varying background $\beta_t$, so that

    $\lambda_t = \phi_t + \beta_t$.

The goal here is to recover accurate estimates of $\phi_t$ and $\beta_t$ from $y_t$, especially when the photon counts
are very low and when the underlying dynamics are unknown.
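As a minimal illustration of the observation model in (1), the following Python sketch draws photon counts from a scene formed as the sum of a background and a small foreground. The image size, intensity levels, and variable names are illustrative assumptions and are not values from this report.

```python
import numpy as np

rng = np.random.default_rng(0)

n_side = 32                                # assumed image side length
beta = np.full((n_side, n_side), 0.5)      # assumed flat background intensity
phi = np.zeros((n_side, n_side))           # foreground intensity
phi[4:8, 20:24] = 2.0                      # small bright object (assumed)

lam = phi + beta                           # scene intensity: foreground + background
y = rng.poisson(lam)                       # photon counts, one per detector

print("mean count per pixel:", y.mean())
print("fraction of pixels with zero photons:", (y == 0).mean())
```

At these assumed intensities most detectors record no photons at all, which is the regime the rest of this section is concerned with.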

There exists a rich literature on image estimation and background subtraction methods, and a wide
variety of effective tools in high SNR regimes. For instance, a common method for object tracking is to form an
estimate of the background scene, and subtract this from the observation to get an estimate for the foreground
[15]. Many of these methods assume that the observed pixel values are the true scene corrupted
with white Gaussian noise distributed around the true, slowly varying background mean [19], which does not
hold under the Poisson observation model or in settings with dynamic backgrounds. Alternatively, another technique
is to learn and track a low-dimensional subspace representation of the background [3]. While such a method
can be modified for the Poisson setting, simply subtracting this background estimate from the observations
will still not yield an accurate foreground estimate in the low-light setting. In fact, even if the background
were known exactly, subtraction would not give a very accurate estimate of the foreground, as shown in
Figure 4.


[Figure 4 panels: (a) Known background; (b) True scene; (c) Poisson observation; (d) Observation minus
true background]

Figure 4: Challenges of background subtraction for photon-limited video. The background (a) and a
foreground object in the top right corner form the true scene (b). Poisson observations are then collected from
the true scene (c). Even if the background were known exactly and subtracted from the observations, the
resulting image (d) is still very noisy, making accurate inference about foreground objects challenging.
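The effect shown in Figure 4 can be reproduced with a small simulation. The sketch below is only an illustration under assumed intensities (a flat background of 3 and a foreground of 1, neither taken from the report): it subtracts the exactly known background from a Poisson observation and compares the error of that naive foreground estimate with the error of simply predicting no foreground at all.

```python
import numpy as np

rng = np.random.default_rng(1)

beta = np.full((64, 64), 3.0)        # known background (assumed level)
phi = np.zeros((64, 64))
phi[5:15, 45:55] = 1.0               # faint foreground object (assumed)

y = rng.poisson(phi + beta)          # photon-limited observation
residual = y - beta                  # naive foreground estimate

# Error of the naive estimate versus error of predicting "no foreground".
rmse_subtraction = np.sqrt(np.mean((residual - phi) ** 2))
rmse_zero = np.sqrt(np.mean(phi ** 2))
print(rmse_subtraction, rmse_zero)
```

Because the Poisson variance equals the scene intensity, the residual carries noise from the whole background, so the subtraction error is governed by the background level rather than by the foreground.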


The photon-limited image estimation problem is particularly challenging because it introduces
intensity-dependent Poisson statistics which require specialized algorithms and analysis for optimal performance.
Simply transforming Poisson data to produce data with approximately Gaussian noise (via, for instance, the
variance stabilizing Anscombe transform [1, 12] or Fisz transform [9, 10]) can be effective when the number
of counts is sufficiently high [6, 22]. However, applying these methods to foreground estimation is a difficult
problem due to the non-linearities induced by the transforms. Specifically, these tools may make it possible
to estimate $\lambda_{t,i}$ effectively, but the inverse problem of estimating $\phi_t$ and $\beta_t$ is significantly more challenging
because of the nonlinear relationship between the unknowns and the variance stabilized observations.
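To make the dependence on the count level concrete, the following sketch applies the Anscombe transform, $2\sqrt{y + 3/8}$, to simulated Poisson counts and measures the variance of the result. The two intensities (20 and 0.1) and the sample size are assumptions chosen for illustration.

```python
import numpy as np

def anscombe(counts):
    """Anscombe variance-stabilizing transform for Poisson counts."""
    return 2.0 * np.sqrt(counts + 3.0 / 8.0)

rng = np.random.default_rng(2)

# Variance after the transform at a moderate and at a very low intensity.
var_high = anscombe(rng.poisson(20.0, size=200_000)).var()
var_low = anscombe(rng.poisson(0.1, size=200_000)).var()
print(var_high, var_low)
```

The transformed variance is close to one at the higher intensity but far below one at the lower intensity, so the approximately Gaussian, unit-variance noise model breaks down in the photon-limited regime.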

In addition, the dynamic setting presents significant opportunity for improved photon-limited
surveillance. Consider the case in which the temporal dynamics are known exactly. For the Gaussian noise setting,
the Kalman filter has proved enormously effective. The known dynamics can effectively act as a prior
probability model for the scene at time $t$, and once $y_t$ has been observed, this prior knowledge can dramatically
improve reconstruction accuracy even when the number of available photons is small.

Generalizations of this approach to Poisson noise are possible with particle filters [2], but particle
degeneracy is a major practical challenge. Furthermore, classical stochastic filtering methods typically assume an
accurate, fully known dynamical model; if a dynamical model is learned from data, it is typically assumed
not to change over time.

We present an online method which estimates the underlying, time-varying dynamical model, and uses
this estimate to generate online estimates of the foreground and background video sequences. Our approach
is based on recent advances in online convex programming and online learning [4, 7, 14, 23]. In particular,
we use a variant of Mirror Descent [4, 14] which incorporates dynamical model estimates [11].

3.1 Problem Formulation

We model the data as Poisson observations of a scene which is mostly background with some dynamic
foreground. In order to distinguish foreground from background, we assume that the two have discernibly
different underlying dynamics, and that the foreground obscures only a small part of the background. We
denote the observation at time $t$ as $y_t$, the background as $\beta_t$, and the foreground as $\phi_t$. The
probability mass function of the observation is therefore given by

    $p(y_t \mid \phi_t, \beta_t) = \prod_{i=1}^{n} \frac{\exp\left(-(\phi_{t,i} + \beta_{t,i})\right) (\phi_{t,i} + \beta_{t,i})^{y_{t,i}}}{y_{t,i}!}$.    (2)

Here, $t$ indicates the time index and $i$ indicates the pixel location. Notice that this model assumes that the observed
scene is the superposition of background and foreground at every pixel. In actuality every pixel would either
be completely foreground or completely background, but it is difficult to model this explicitly because the
locations of the foreground pixels would need to be known exactly a priori. Using this model, we wish to
reconstruct $\beta_t$ and $\phi_t$ as accurately as possible in a time-efficient manner.

3.2  Dynamic  Fixed  Share  algorithm 

In order to solve the problems of background and foreground estimation, we will use an algorithm called
Dynamic Fixed Share (DFS) [11]. In this section, we describe the DFS method in a general setting; its
application to background subtraction problems will be described in the next section.

DFS takes in streaming observations and a family of candidate dynamic models to
produce a sequence of estimates (denoted $\hat{\theta}_t$) with provably low loss. Specifically, at time $t$ we make an
observation $y_t$, and it induces a convex loss function

    $\ell_t(\theta) = f_t(\theta) + r(\theta)$,

where $f_t(\theta)$ describes how well a candidate estimate $\theta$ fits the observation $y_t$ and $r(\theta)$ is a regularization
function.
DFS works in two steps, the first being to produce an estimate for each candidate dynamical model $\Phi^{(i)}$
at each time step in the following way:

    $\tilde{\theta}^{(i)}_{t+1} = \arg\min_{\theta \in \Theta} \; \eta_t \langle \nabla f_t(\hat{\theta}^{(i)}_t), \theta \rangle + \eta_t r(\theta) + D(\theta \,\|\, \hat{\theta}^{(i)}_t)$    (3)

    $\hat{\theta}^{(i)}_{t+1} = \Phi^{(i)}(\tilde{\theta}^{(i)}_{t+1})$    (4)

Here, $\eta_t$ is a step size parameter and $D(\cdot \,\|\, \cdot)$ is a Bregman divergence. These equations effectively update
the previous estimate by taking a step in the direction of the negative gradient of $f_t$, while also ensuring
that the new estimate is well regularized and close to the previous estimate. Once this intermediate estimate
is found (3), the dynamical model is applied to get the next estimate (4). The second part of DFS is to
produce a single estimate from all of the sub-estimates produced by the individual dynamic models. It does
this by taking a weighted average of the sub-estimates, with weights based on the accumulated loss of each
candidate model.
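The two steps can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation used in this work: it assumes a squared-error loss, no regularizer, a squared Euclidean distance as the Bregman divergence (so that step (3) reduces to a plain gradient step), and a standard fixed-share weight update. The names `dfs_step`, `eta`, `eta_r`, and `alpha` and their default values are hypothetical.

```python
import numpy as np

def dfs_step(theta_hats, weights, y, models, eta=0.5, eta_r=1.0, alpha=0.05):
    """One round of a simplified Dynamic Fixed Share update (assumptions above)."""
    n_models = len(models)
    losses = np.array([0.5 * np.sum((th - y) ** 2) for th in theta_hats])

    # Step 1a: gradient step on each model's current prediction.
    tildes = [th - eta * (th - y) for th in theta_hats]
    # Step 1b: push each intermediate estimate through its own dynamics.
    new_hats = [model(tl) for model, tl in zip(models, tildes)]

    # Step 2: exponential weights on the incurred loss, then share a
    # fraction alpha of the mass so the method can switch models later.
    w = weights * np.exp(-eta_r * losses)
    w = w / w.sum()
    w = alpha / n_models + (1.0 - alpha) * w
    return new_hats, w

# Candidate dynamics: stationary, shift right, shift left (circular shifts).
models = [lambda th: th, lambda th: np.roll(th, 1), lambda th: np.roll(th, -1)]
truth = np.zeros(16)
truth[3] = 5.0

theta_hats = [np.zeros(16) for _ in models]
weights = np.full(len(models), 1.0 / len(models))
errors = []
for t in range(30):
    y = truth                          # noiseless observations keep the demo simple
    # Combined estimate: weighted average of the per-model predictions.
    prediction = sum(w * th for w, th in zip(weights, theta_hats))
    errors.append(np.linalg.norm(prediction - truth))
    theta_hats, weights = dfs_step(theta_hats, weights, y, models)
    truth = np.roll(truth, 1)          # true dynamics: shift right by one sample

print(np.round(weights, 3), errors[0], errors[-1])
```

In this toy setting the weight concentrates on the model whose dynamics match the data, and the combined prediction error shrinks accordingly.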

We characterize the performance of this approach via a regret bound, which quantifies the difference
between the accumulated loss of our method and the accumulated loss of any comparator sequence
$\theta_1, \theta_2, \ldots, \theta_T$ which might be output by a competing, potentially batch, method. It is shown that the
estimate produced by the DFS method satisfies the following regret bound:

    $\sum_{t=1}^{T} \ell_t(\hat{\theta}_t) - \sum_{t=1}^{T} \ell_t(\theta_t) \le O\Big( \sqrt{T} \Big[ (m+1)(\log(N) + 1) + \log \frac{1}{\alpha^{m} (1-\alpha)^{T-m-1}}$

    $\qquad + \min_{t_2, \ldots, t_{m+1}} \; \min_{i_1, \ldots, i_{m+1}} \sum_{k=1}^{m+1} \sum_{t=t_k}^{t_{k+1}-1} \left\| \theta_{t+1} - \Phi^{(i_k)}(\theta_t) \right\| \Big] \Big)$,

where $\Phi^{(i)}$ denotes the $i$-th candidate dynamical model, $N$ is the number of dynamic models considered, $m$ is
the maximum number of times the optimal dynamic models used to describe the comparator sequence can switch, and
$\alpha$ is a parameter of the algorithm between 0 and 1, which is an estimate of the fraction of times the underlying
dynamic model should switch (approximately $m/T$). The final line of the bound measures how well the comparator
sequence $\theta_1, \theta_2, \ldots, \theta_T$ follows the dynamics on $m + 1$ optimally chosen time segments. This variation term
finds the best dynamical model in our family and the optimal time points such that the variation term is
minimized. This means that if the comparator sequence can be appropriately described as a series of a few
subsequences which each closely follow one of the dynamical models, then the regret bound will be low.
For more details see [11].
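The variation term can be made concrete with a small numerical example. The sketch below evaluates it for a single time segment (no model switches) and two hypothetical candidate models, a circular shift and a stationary model. The Euclidean norm and the toy comparator sequence are assumptions made for illustration.

```python
import numpy as np

def variation(thetas, model):
    """Sum over t of ||theta_{t+1} - model(theta_t)|| for one dynamical model."""
    return sum(np.linalg.norm(nxt - model(cur))
               for cur, nxt in zip(thetas[:-1], thetas[1:]))

shift = lambda th: np.roll(th, 1)     # candidate model: shift by one sample
identity = lambda th: th              # candidate model: stationary

theta0 = np.array([0.0, 3.0, 0.0, 0.0, 0.0, 0.0])
# Comparator sequence that follows the shift model exactly.
comparator = [np.roll(theta0, t) for t in range(5)]

v_shift = variation(comparator, shift)
v_identity = variation(comparator, identity)
print(v_shift, v_identity)
```

A comparator that follows one of the candidate models exactly contributes nothing to the variation term under that model, while a mismatched model accumulates a penalty at every time step.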

It is important to note that we use the DFS algorithm for the background instead of an exponentially
weighted moving average of past observations,

    $\hat{\beta}_t \propto \sum_{s=1}^{t-1} \alpha^{s} y_{t-s}$,

for some $\alpha \in [0, 1]$. This is important because if the background has some dynamic motion, the moving
average would perform poorly. If $\alpha$ were set too low, then the background estimate would be heavily
corrupted by Poisson noise artifacts. On the other hand, if $\alpha$ were very close to 1, the motion of the
background would cause blur in the estimate. Even if $\alpha$ is chosen in between these two extremes, the
estimate would not reflect the true background very well, as shown in Figure 5.
Figure 5: Absolute difference between moving average and true background with $\alpha = 0.99$. The true
background has max value of 5, meaning the errors are relatively large. Notice that this image contains both
errors at the leading edge due to motion, and noise errors from the observation model. Both of these errors
would adversely affect the foreground estimation performance.
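The tradeoff in the forgetting factor can be seen in a one-dimensional toy example. The sketch below uses a recursive form of the exponentially weighted average, which is an assumed normalization, together with an assumed background profile and drift rate. None of these values are taken from the experiments in this report.

```python
import numpy as np

def ewma_background(frames, alpha):
    """Recursive exponentially weighted average of the frames seen so far."""
    estimate = frames[0].astype(float)
    for frame in frames[1:]:
        estimate = alpha * estimate + (1.0 - alpha) * frame
    return estimate

rng = np.random.default_rng(4)
base = np.zeros(64)
base[20:30] = 5.0                                        # assumed background profile
static = [base for _ in range(400)]
moving = [np.roll(base, t // 20) for t in range(400)]    # drift: 1 sample per 20 frames

def rmse(truth_frames, alpha):
    """Error of the moving average against the most recent true background."""
    frames = [rng.poisson(f) for f in truth_frames]
    estimate = ewma_background(frames, alpha)
    return np.sqrt(np.mean((estimate - truth_frames[-1]) ** 2))

for alpha in (0.5, 0.9, 0.99):
    print(alpha, rmse(static, alpha), rmse(moving, alpha))
```

For the static background a forgetting factor near 1 averages away the Poisson noise, whereas for the drifting background the same setting blurs the moving edges and the error grows.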


3.3  Method 


Our first step is to find an estimate of the background, so we must first find a loss function for estimating $\beta_t$.
We use the negative Poisson log-likelihood function of the observation, omitting the $\log(y_{t,i}!)$ term since it is an
offset that does not depend on $\beta$:

    $f_{\beta,t}(\beta) = \sum_{i=1}^{n} \left( \beta_i - y_{t,i} \log(\beta_i + \gamma) \right)$.    (5)

A small constant $\gamma$ is added to ensure numerical stability. Notice that this is the same loss function that
would be used if the video sequence were assumed to have only background content.

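We then wish to estimate $\phi_t$. We again start by using the negative Poisson log-likelihood as a basis for
the loss function for $\phi_t$, but now assume access to an estimate of the background, $\hat{\beta}_t$:

    $-\log p(y_t \mid \phi, \hat{\beta}_t) = \sum_{i=1}^{n} \left( \phi_i + \hat{\beta}_{t,i} - \log \frac{(\phi_i + \hat{\beta}_{t,i})^{y_{t,i}}}{y_{t,i}!} \right)$.    (6)

Assuming that the background estimate has already been found, this leads to the following data fit function
for the foreground:

    $f_{\phi,t}(\phi; \hat{\beta}_t) = \sum_{i=1}^{n} \left( \phi_i - y_{t,i} \log\left( \frac{\phi_i}{\hat{\beta}_{t,i} + \gamma} + 1 \right) \right)$.    (7)

This loss function comes from the negative log-likelihood function by subtracting
$\sum_{i=1}^{n} \left( \hat{\beta}_{t,i} - y_{t,i} \log(\hat{\beta}_{t,i}) + \log(y_{t,i}!) \right)$,
which is independent of $\phi_t$. Again, a small positive constant $\gamma$ is used to ensure numerical
stability.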
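The relationship between the foreground data-fit term and the full negative log-likelihood can be checked numerically. The following sketch implements the background and foreground data-fit terms and verifies that, with the stabilizing constant set to zero, the foreground term equals the joint negative log-likelihood minus the background-only one. The function names, the default value of the stabilizing constant, and the test intensities are illustrative assumptions.

```python
import numpy as np
from math import lgamma

def poisson_nll(y, lam):
    """Full negative Poisson log-likelihood, including the log(y!) term."""
    log_fact = np.array([lgamma(v + 1.0) for v in y.ravel()]).reshape(y.shape)
    return np.sum(lam - y * np.log(lam) + log_fact)

def background_fit(beta, y, gamma=1e-6):
    """Data-fit term for the background (log(y!) term omitted)."""
    return np.sum(beta - y * np.log(beta + gamma))

def foreground_fit(phi, beta_hat, y, gamma=1e-6):
    """Data-fit term for the foreground given a background estimate."""
    return np.sum(phi - y * np.log(phi / (beta_hat + gamma) + 1.0))

rng = np.random.default_rng(3)
beta_hat = rng.uniform(0.5, 2.0, size=50)
phi = rng.uniform(0.0, 1.0, size=50)
y = rng.poisson(phi + beta_hat)

# With gamma = 0, the foreground fit is the joint NLL minus the
# background-only NLL, which does not depend on phi.
lhs = foreground_fit(phi, beta_hat, y, gamma=0.0)
rhs = poisson_nll(y, phi + beta_hat) - poisson_nll(y, beta_hat)
print(lhs, rhs)
```

Since the subtracted quantity does not depend on the foreground, minimizing the foreground data-fit term is equivalent to minimizing the negative log-likelihood over the foreground for a fixed background estimate.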

Finally, we include regularization penalties, $r_\beta$ and $r_\phi$. For this application, we use a total variation
penalty [5, 8], which ensures that the estimates are somewhat smooth, as would be expected in natural
images. This makes the overall loss functions the following:

    $\ell_{\beta,t}(\beta) = f_{\beta,t}(\beta) + \tau_\beta \|\beta\|_{TV}$    (8)

    $\ell_{\phi,t}(\phi; \hat{\beta}) = f_{\phi,t}(\phi; \hat{\beta}) + \tau_\phi \|\phi\|_{TV}$,    (9)

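where $\tau_\beta$ and $\tau_\phi$ are tradeoff parameters between data fidelity and regularization for the background and
foreground respectively. Notice how this process essentially tries to find a coarse estimate for the underlying
scene in the background, and then tries to find a foreground which fine tunes this estimate. These loss
functions, as constructed, are convex, which means we can use online convex optimization techniques. We
will also use the fact that the background and foreground should have different dynamics to help with
separation and reconstruction.

Algorithm 1 Background and Foreground Estimation

for $t = 1, \ldots, T$ do
    Observe $y_t$
    for $i = 1, \ldots, N_1$ do
        $\tilde{\beta}_{i,t+1} = \arg\min_{\beta \in B} \; \eta_t \langle \nabla f_{\beta,t}(\hat{\beta}_{i,t}), \beta \rangle + \eta_t \tau_\beta \|\beta\|_{TV} + \frac{1}{2} \|\beta - \hat{\beta}_{i,t}\|_2^2$
        Apply the $i$-th background dynamical model to $\tilde{\beta}_{i,t+1}$ to obtain $\hat{\beta}_{i,t+1}$
        Update the weight $w^{\beta}_{i,t+1}$ from the accumulated loss of model $i$
    end for
    $\hat{\beta}_{t+1} = \sum_{i=1}^{N_1} w^{\beta}_{i,t+1} \hat{\beta}_{i,t+1} \Big/ \sum_{i=1}^{N_1} w^{\beta}_{i,t+1}$
    for $k = 1, \ldots, N_2$ do
        $\tilde{\phi}_{k,t+1} = \arg\min_{\phi \in F} \; \eta_t \langle \nabla f_{\phi,t}(\hat{\phi}_{k,t}; \hat{\beta}_{t+1}), \phi \rangle + \eta_t \tau_\phi \|\phi\|_{TV} + \frac{1}{2} \|\phi - \hat{\phi}_{k,t}\|_2^2$
        $\tilde{\phi}_{k,t+1} = \mathrm{SoftThresh}(\tilde{\phi}_{k,t+1})$
        Apply the $k$-th foreground dynamical model to $\tilde{\phi}_{k,t+1}$ to obtain $\hat{\phi}_{k,t+1}$
        Update the weight $w^{\phi}_{k,t+1}$ from the accumulated loss of model $k$
    end for
    $\hat{\phi}_{t+1} = \sum_{k=1}^{N_2} w^{\phi}_{k,t+1} \hat{\phi}_{k,t+1} \Big/ \sum_{k=1}^{N_2} w^{\phi}_{k,t+1}$
end for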
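For concreteness, the total variation penalty appearing in (8) and (9) can be computed as in the sketch below. The report does not state which discretization was used, so the anisotropic form here (sum of absolute differences between neighboring pixels) is an assumption, as are the test images.

```python
import numpy as np

def tv_norm(image):
    """Anisotropic total variation: sum of absolute neighbor differences."""
    dx = np.abs(np.diff(image, axis=1)).sum()   # horizontal differences
    dy = np.abs(np.diff(image, axis=0)).sum()   # vertical differences
    return dx + dy

flat = np.ones((8, 8))                 # constant image
step = np.zeros((8, 8))
step[:, 4:] = 2.0                      # one vertical edge of height 2

rng = np.random.default_rng(5)
noisy = rng.poisson(np.ones((8, 8))).astype(float)   # photon-limited flat image

print(tv_norm(flat), tv_norm(step), tv_norm(noisy))
```

A piecewise constant image incurs a penalty only along its edges, while pixel-level Poisson fluctuations are penalized everywhere, which is why this penalty favors smooth estimates with a few sharp boundaries.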

The overall procedure is described in Algorithm 1. For both of the inner loops, the minimization was
found by using the FISTA algorithm of Beck and Teboulle [5]. Additionally, a small amount of soft
thresholding is applied to the foreground estimate at each time step, to ensure that ambiguous areas that could
be considered either background or foreground are removed from the foreground estimate. Without this
thresholding, these ambiguous areas would appear in the foreground estimate as an underlying haze. It is
important to notice that for both the background and foreground we have two slightly different estimates. The values
denoted $\tilde{\beta}_t$ and $\tilde{\phi}_t$ are the filtering estimates, meaning that they are reconstructions for time $t$ using all the
observations up to time $t$. The values $\hat{\beta}_{t+1}$ and $\hat{\phi}_{t+1}$ are the prediction values, meaning they use all the data
up to time $t$ to predict the observation at time $t + 1$.
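The soft thresholding step can be written in one line. The sketch below uses the standard soft-threshold operator. The threshold value and the example foreground values are assumptions, since the report does not state the threshold that was used.

```python
import numpy as np

def soft_thresh(x, tau):
    """Shrink every entry of x toward zero by tau, zeroing small entries."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

phi_tilde = np.array([0.05, 0.3, 1.2, 0.0, 2.5])   # assumed foreground values
print(soft_thresh(phi_tilde, 0.2))
```

Entries below the threshold, which correspond to the faint haze described above, are set exactly to zero, while larger entries are reduced by the threshold amount.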

3.4  Experimental  Results 


Figure  6:  Foreground  and  background  reconstruction  at  t  =250.  The  true  image  (a)  has  foreground  and 
background  content,  and  the  observations  (b)  are  extremely  noisy.  We  form  a  background  estimate  (c)  and 
use  it  to  obtain  a  foreground  estimate  (d).  Notice  the  details  visible  in  the  foreground  estimate  such  as  the 
windows  and  tail  structure  of  the  plane. 


[Figure 7 panels: (a) True scene; (b) Poisson observation; (c) Background estimate; (d) Foreground estimate]

Figure 7: Foreground and background reconstruction at t = 925. Again notice how the foreground object
in (a) is basically imperceptible in the observations (b). By estimating the background (c), an accurate
foreground estimate can be constructed (d).


To test this method, we created a data set that featured an object moving across a slowly varying
background in the following way:

    $x_t = \sum_{i=1}^{n} \left[ \mathbb{1}_{\phi,t}(i)\, \phi_{t,i} + \left(1 - \mathbb{1}_{\phi,t}(i)\right) \beta_{t,i} \right] e_i$,

    $y_t \sim \mathrm{Poisson}(x_t)$,

where $\mathbb{1}_{\phi,t}(i)$ is the indicator of pixel $i$ being foreground or not and $e_i$ is the $i$-th standard basis vector.
This process shows a foreground object being translated by the foreground dynamics on top of a background
image moving with the background dynamics. The images were compiled by designating certain pixels as the
foreground object and everything else as background. Notice that the algorithm assumes every pixel is
the sum of foreground and background, but the data used is more realistic in that each pixel is either one
or the other.
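A single frame of such a data set can be composed as in the following sketch, where each pixel takes either the foreground or the background intensity according to a mask. The image size, intensity ranges, and object location are assumptions chosen only so that no pixel exceeds a mean of 5.

```python
import numpy as np

rng = np.random.default_rng(6)

background = rng.uniform(0.5, 3.0, size=(32, 32))   # assumed background intensities
foreground = np.full((32, 32), 5.0)                 # assumed object intensity
mask = np.zeros((32, 32), dtype=bool)
mask[10:14, 6:12] = True                            # pixels occupied by the object

# Each pixel shows either the object or the background, never their sum.
scene = np.where(mask, foreground, background)
counts = rng.poisson(scene)                         # photon-limited observation
print(scene.max(), counts.mean())
```

This differs from the additive model assumed by the algorithm only at the masked pixels, where the background intensity is replaced rather than added to.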

Each image is 150 × 150, and no pixel has a mean value greater than 5, so the video is extremely photon
limited. For the background, the true underlying dynamics was a subpixel shift of 1/50th of a pixel to the
top left at every time step. For the foreground, the true dynamics was a full pixel shift to the top right for
the first 500 frames and to the bottom right for the second 500 frames. The candidate dynamic models used for
the background were subpixel shifts of 1/50th of a pixel in directions of $k\pi/4$ for $k = 1, 2, \ldots, 8$, together with a
stationary model. The foreground candidate dynamics were full pixel shifts in the same directions as well
as a stationary dynamic.
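A family of full-pixel candidate dynamics of this kind can be generated as sketched below. The helper `shift_model` is hypothetical, and the circular wrap-around at the image border is an assumption made to keep the example short. The subpixel background shifts would additionally require interpolation, which is not shown.

```python
import numpy as np

def shift_model(dy, dx):
    """Return a model that circularly shifts an image by (dy, dx) pixels."""
    return lambda image: np.roll(image, shift=(dy, dx), axis=(0, 1))

# Stationary model plus eight directions k*pi/4, k = 1..8, rounded to unit
# pixel steps. Image rows grow downward, so "up" is a negative row shift.
offsets = [(0, 0)]
for k in range(1, 9):
    angle = k * np.pi / 4
    dy = -int(np.rint(np.sin(angle)))
    dx = int(np.rint(np.cos(angle)))
    offsets.append((dy, dx))

models = [shift_model(dy, dx) for dy, dx in offsets]

image = np.zeros((5, 5))
image[2, 2] = 1.0
top_right = models[1](image)          # k = 1: one pixel up and one pixel right
print(np.argwhere(top_right == 1.0))
```

Each element of `models` is a function from an image to an image, so the list can be passed directly to an update that applies one candidate dynamical model per sub-estimate.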

Figures  6  and  7  show  examples  of  the  DFS  algorithm  taking  the  series  of  Poisson  observations,  and 
making  accurate  representations  of  the  foreground  and  background.  It  is  especially  important  to  notice 
that  including  the  foreground  and  background  dynamics  allows  for  the  foreground  image  to  become  clear. 
Without  incorporating  dynamics  and  regularization,  the  additional  foreground  image  would  simply  be  the 
transient  errors  of  the  background  estimation.  By  including  the  dynamics  in  the  optimization  process,  the 
systematic  difference  of  the  background  estimate  can  be  found  to  be  the  foreground  object. 


4  Conclusions 

The research supported by this grant resulted in several innovative methods and supporting theory for
photon-limited sensing and surveillance. We have developed practical methods representative of the
current state-of-the-art with online code actively used by the signal processing community. We have also
developed theory that highlights the challenges of photon-limited imaging in compressed sensing contexts
and described the potential benefits of using conventional imagers rather than compressive imagers. In
addition, we have developed novel theory that characterizes the performance of online learning methods which
learn and exploit underlying dynamics. These fundamental performance bounds relating to reconstruction
accuracy and regret bounds associated with sequential processing of video frames guided the development
of fast and novel computational techniques, and set the stage for further advances in future work.